Hands-on Exercise 03b

Author

Andrea Yeo

Modified

January 31, 2025

With the assistance of ChatGPT

3b. Programming animated statistical graphics with R

3.1 Overview

In this Hands-on exercise 03b, we will learn how to create engaging animated data visualizations using the gganimate and plotly R packages. We will also learn how to reshape data with the tidyr package and process, wrangle, and transform data with the dplyr package.

Overall, animated graphics not only captivate the audience but also leave a lasting impression, making them an effective tool for visually driven data storytelling.
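The overview mentions reshaping data with tidyr. As a minimal sketch, assuming a small hypothetical table laid out like the GlobalPopulation worksheet (the names wide and long and all values are illustrative only), pivot_longer() stacks the Young and Old columns into one AgeGroup/Percent pair per row:

```r
library(tibble)
library(tidyr)

# Hypothetical toy data in the worksheet's wide layout
# (values are illustrative, not taken from GlobalPopulation.xls)
wide <- tibble(
  Country = c("A", "B"),
  Year    = c(2000L, 2000L),
  Young   = c(80, 30),
  Old     = c(5, 20)
)

# Reshape: one row per Country-Year-AgeGroup combination
long <- pivot_longer(wide,
                     cols      = c(Young, Old),
                     names_to  = "AgeGroup",
                     values_to = "Percent")
long
```

A long layout like this is one way to map the age group to colour or facets in ggplot2; whether it is needed depends on the plot being built.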

3.1.1 Basic concepts of animation

Animations in data visualization are created by generating a series of individual plots, each representing a subset of the data. These plots are then stitched together into sequential frames to create the illusion of motion, similar to a flipbook or traditional cartoons. The animated effect is driven by the transitions between data subsets over time.
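To make the flipbook idea concrete, here is a minimal base-R sketch on hypothetical toy data (pop and frames are illustrative names): split() breaks the data into one subset per Year, and each subset is what a single frame would draw.

```r
# Hypothetical toy data: one row per country per year
pop <- data.frame(
  Country    = rep(c("A", "B"), times = 3),
  Year       = rep(c(2000, 2010, 2020), each = 2),
  Population = c(10, 20, 12, 22, 15, 21)
)

# Each subset of the data (here, one per Year) becomes one frame
frames <- split(pop, pop$Year)

names(frames)          # "2000" "2010" "2020"
sapply(frames, nrow)   # two countries in every frame
frames[["2010"]]       # the rows drawn in the 2010 frame
```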

3.1.2 Terminology

Before creating an animated statistical graph, it’s important to understand key concepts:

  • Frame: each frame represents a specific point in time or category, updating the graph’s data points as it changes.
  • Animation attributes: control the animation’s behavior, such as frame duration, easing functions for transitions, and whether the animation starts from the current frame or resets to the beginning.
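Easing describes how an aesthetic moves between two keyframes. The sketch below is a simplified illustration, not gganimate's internal code (gganimate delegates tweening to the tweenr package): the hypothetical helper tween_value() interpolates from a start value to an end value, with an easing function reshaping the progress between 0 and 1.

```r
# Hypothetical helper: value of an aesthetic at a given progress
# (0 = start frame, 1 = end frame), shaped by an easing function
tween_value <- function(start, end, progress, ease = function(p) p) {
  start + (end - start) * ease(progress)
}

linear   <- function(p) p     # constant speed
cubic_in <- function(p) p^3   # starts slowly, accelerates towards the end

progress <- seq(0, 1, by = 0.25)
tween_value(10, 20, progress, linear)    # 10.0 12.5 15.0 17.5 20.0
tween_value(10, 20, progress, cubic_in)  # 10.00000 10.15625 11.25000 14.21875 20.00000
```

Both easings start and end at the same values; only the path in between differs, which is what the viewer perceives as smooth or accelerating motion.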
To read

Consider whether the effort is justified before creating animated graphs. For exploratory data analysis, animations may not be worth the time. However, in presentations, well-placed animations can significantly enhance audience engagement compared to static visuals.

3.2 Getting started

3.2.1 Loading the R packages

First, we install and load the following R packages using pacman::p_load(), which installs any missing package before loading it:

  • readxl: Reads Excel worksheets (.xls and .xlsx files) into R.
  • plotly: An R library for creating interactive statistical graphs.
  • gganimate: A ggplot2 extension for making animated graphs.
  • gifski: A tool for converting video frames into high-quality animated GIFs using advanced palette and dithering techniques.
  • gapminder: An excerpt of the data available at Gapminder.org; here it is used mainly for its country_colors colour scheme.
  • tidyverse: A collection of modern R packages designed for data science tasks, including analysis, communication, and creating static graphs.
Code
pacman::p_load(readxl, gifski, gapminder,
               plotly, gganimate, tidyverse)

3.2.2 Importing the data

In this hands-on exercise, the Data worksheet of the GlobalPopulation Excel workbook will be used.

We import the Data worksheet with read_xls() from the readxl package, which is installed as part of the tidyverse family.

Note
  • read_xls(): Imports Excel worksheets, readxl package
  • mutate_each_(): Converts the columns listed in col (Country and Continent) to factors, dplyr package
  • mutate(): Converts the Year field values to integers, dplyr package
Code
col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  mutate_each_(funs(factor(.)), col) %>%
  mutate(Year = as.integer(Year))
Warning
  • Warning: mutate_each_() was deprecated in dplyr 0.7.0.
  • Warning: funs() was deprecated in dplyr 0.8.0.

We will rewrite the code using mutate_at(), as shown below.

Alternatively, mutate(across()) can be used to derive the same output.

Code
col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  mutate_at(col, as.factor) %>%
  mutate(Year = as.integer(Year))
Code
col <- c("Country", "Continent")
globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet="Data") %>%
  mutate(across(all_of(col), as.factor)) %>%  # all_of() avoids tidyselect's external-vector warning
  mutate(Year = as.integer(Year))
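A quick way to confirm what the conversion does is to run it on a tiny stand-in table. This is a sketch under assumptions: raw and converted are hypothetical names and the values are made up; only the mutate(across(all_of(col), ...)) and as.integer() steps mirror the code above.

```r
library(dplyr)

# Hypothetical stand-in for the imported worksheet
raw <- data.frame(
  Country   = c("Singapore", "Malaysia", "Singapore"),
  Year      = c(1996, 1996, 1998),   # read_xls() returns numeric years as doubles
  Continent = c("Asia", "Asia", "Asia"),
  stringsAsFactors = FALSE
)

col <- c("Country", "Continent")
converted <- raw %>%
  mutate(across(all_of(col), as.factor)) %>%  # character -> factor
  mutate(Year = as.integer(Year))             # double -> integer

sapply(converted, class)  # Country "factor", Year "integer", Continent "factor"
```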

3.2.3 Inspecting the data

Code
# Re-import the raw worksheet (without the factor/integer conversions) for inspection
globalPop <- read_xls("data/GlobalPopulation.xls", sheet = "Data")

We will check the dataset using the functions below:

  • glimpse(): provides a transposed overview of a dataset, showing variables and their types in a concise format.
  • head(): displays the first few rows of a dataset (default is 6 rows) to give a quick preview of the data.
  • summary(): generates a statistical summary of each variable, including measures like mean, median, and range for numeric data.
  • duplicated(): Returns a logical vector indicating which elements or rows in a vector or data frame are duplicates.
  • colSums(is.na()): Counts the number of missing values (NA) in each column of the data frame.
Code
glimpse(globalPop)
Rows: 6,204
Columns: 6
$ Country    <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
$ Year       <dbl> 1996, 1998, 2000, 2002, 2004, 2006, 2008, 2010, 2012, 2014,…
$ Young      <dbl> 83.6, 84.1, 84.6, 85.1, 84.5, 84.3, 84.1, 83.7, 82.9, 82.1,…
$ Old        <dbl> 4.5, 4.5, 4.5, 4.5, 4.5, 4.6, 4.6, 4.6, 4.6, 4.7, 4.7, 4.7,…
$ Population <dbl> 21559.9, 22912.8, 23898.2, 25268.4, 28513.7, 31057.0, 32738…
$ Continent  <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "As…
Code
head(globalPop)
# A tibble: 6 × 6
  Country      Year Young   Old Population Continent
  <chr>       <dbl> <dbl> <dbl>      <dbl> <chr>    
1 Afghanistan  1996  83.6   4.5     21560. Asia     
2 Afghanistan  1998  84.1   4.5     22913. Asia     
3 Afghanistan  2000  84.6   4.5     23898. Asia     
4 Afghanistan  2002  85.1   4.5     25268. Asia     
5 Afghanistan  2004  84.5   4.5     28514. Asia     
6 Afghanistan  2006  84.3   4.6     31057  Asia     
Code
summary(globalPop)
   Country               Year          Young             Old       
 Length:6204        Min.   :1996   Min.   : 15.50   Min.   : 1.00  
 Class :character   1st Qu.:2010   1st Qu.: 25.70   1st Qu.: 6.90  
 Mode  :character   Median :2024   Median : 34.30   Median :12.80  
                    Mean   :2023   Mean   : 41.66   Mean   :17.93  
                    3rd Qu.:2038   3rd Qu.: 53.60   3rd Qu.:25.90  
                    Max.   :2050   Max.   :109.20   Max.   :77.10  
   Population         Continent        
 Min.   :      3.3   Length:6204       
 1st Qu.:    605.9   Class :character  
 Median :   5771.6   Mode  :character  
 Mean   :  34860.9                     
 3rd Qu.:  22711.0                     
 Max.   :1807878.6                     
Code
globalPop[duplicated(globalPop),]
# A tibble: 0 × 6
# ℹ 6 variables: Country <chr>, Year <dbl>, Young <dbl>, Old <dbl>,
#   Population <dbl>, Continent <chr>
Code
colSums(is.na(globalPop))
   Country       Year      Young        Old Population  Continent 
         0          0          0          0          0          0 
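The duplicate and missing-value checks above both returned zero, so it can help to see what they report when there is something to find. A minimal sketch on a hypothetical toy data frame with one repeated row and one NA:

```r
# Hypothetical toy data with one duplicated row and one missing value
toy <- data.frame(
  Country    = c("A", "A", "B", "C"),
  Year       = c(2000, 2000, 2000, 2000),
  Population = c(10, 10, NA, 30)
)

# duplicated() flags rows identical to an earlier row (the first copy is FALSE)
duplicated(toy)          # FALSE  TRUE FALSE FALSE
toy[duplicated(toy), ]   # only the second "A" row

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
colSums(is.na(toy))      # Country 0, Year 0, Population 1
```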

3.3 Animated data visualisation: gganimate methods

gganimate extends ggplot2 by adding animation-specific grammar, allowing plots to dynamically change over time with customizable transitions.

  • transition_*(): Defines how data is distributed and related over time.
  • view_*(): Controls how positional scales change during the animation.
  • shadow_*(): Determines how data from other time points is displayed at a given moment.
  • enter_*() / exit_*(): Specifies how new data enters and old data exits during the animation.
  • ease_aes(): Adjusts how aesthetics transition smoothly over time.
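As a sketch of how these pieces combine, the code below adds one function from each family to a toy scatter plot. The data are hypothetical, and the particular choices (view_follow(), shadow_wake(), enter_fade(), exit_fade(), 'cubic-in-out') are illustrative picks rather than what this exercise uses:

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy data: three countries observed in three years
toy <- data.frame(
  Country = rep(c("A", "B", "C"), times = 3),
  Year    = rep(c(2000L, 2010L, 2020L), each = 3),
  Old     = c(5, 10, 15, 6, 12, 18, 7, 14, 21),
  Young   = c(40, 30, 20, 38, 28, 18, 35, 25, 15)
)

p <- ggplot(toy, aes(x = Old, y = Young, colour = Country)) +
  geom_point(size = 4) +
  transition_time(Year) +           # transition_*(): spread the data over time
  view_follow(fixed_y = TRUE) +     # view_*(): x-axis follows the data, y-axis fixed
  shadow_wake(wake_length = 0.1) +  # shadow_*(): fading trail of earlier positions
  enter_fade() + exit_fade() +      # enter_*()/exit_*(): fade appearing/disappearing points
  ease_aes("cubic-in-out")          # ease_aes(): easing applied to all aesthetics

# Printing p, or calling animate(p), renders the frames
```

Because every country appears in every year here, enter_fade() and exit_fade() have no visible effect; they only act on points that appear or disappear between states.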

3.3.1 Building a static population bubble plot

The code below uses basic ggplot2 functions to create a static bubble plot. The country_colors palette comes from the gapminder package.

Code
ggplot(globalPop, aes(x = Old, y = Young, 
                      size = Population, 
                      colour = Country)) +
  geom_point(alpha = 0.7, 
             show.legend = FALSE) +
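  scale_colour_manual(values = country_colors) +  # country_colors comes from gapminder
  scale_size(range = c(2, 12)) +
  # '{frame_time}' stays literal in a static plot; gganimate fills it in
  # only after a transition is added (see 3.3.2)
  labs(title = 'Year: {frame_time}', 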
       x = '% Aged', 
       y = '% Young') 

3.3.2 Building the animated bubble plot

The code below uses two functions to create an animated bubble plot:

  • transition_time() of gganimate is used to create transitions through distinct states in time (here, Year).
  • ease_aes() is used to control the easing of aesthetics. The default is linear. Other methods are quadratic, cubic, quartic, quintic, sine, circular, exponential, elastic, back, and bounce, each of which can be combined with an -in, -out, or -in-out modifier (for example, 'cubic-in-out').

Code
ggplot(globalPop, aes(x = Old, y = Young, 
                      size = Population, 
                      colour = Country)) +
  geom_point(alpha = 0.7, 
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(title = 'Year: {frame_time}', 
       x = '% Aged', 
       y = '% Young') +
  transition_time(Year) +       
  ease_aes('linear')          
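Animation attributes such as the number of frames and the playback speed are set when the plot is rendered. A minimal sketch on hypothetical toy data, assuming the gifski package from section 3.2.1 is installed; the file name bubble.gif is a placeholder, and the rendering step is wrapped in interactive() only so that non-interactive runs stay fast:

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy data: two countries in two years
toy <- data.frame(
  Country = rep(c("A", "B"), times = 2),
  Year    = rep(c(2000L, 2020L), each = 2),
  Old     = c(5, 10, 7, 14),
  Young   = c(40, 30, 35, 25)
)

p <- ggplot(toy, aes(x = Old, y = Young, colour = Country)) +
  geom_point(size = 4) +
  labs(title = "Year: {frame_time}") +
  transition_time(Year) +
  ease_aes("linear")

# Rendering is slow, so it only runs in an interactive session here
if (interactive()) {
  anim <- animate(p,
                  nframes   = 50,                 # total number of frames
                  fps       = 10,                 # frames shown per second
                  end_pause = 10,                 # repeat the last frame before looping
                  renderer  = gifski_renderer())  # write frames as an animated GIF
  anim_save("bubble.gif", animation = anim)
}
```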

3.4 Animated data visualisation: plotly

In the Plotly R package, both ggplotly() and plot_ly() enable keyframe animations using the frame argument or aesthetic. Additionally, they support the ids argument or aesthetic to ensure smooth transitions for objects with the same ID, promoting object constancy during animations.
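The same frame and ids ideas can be expressed directly with plot_ly(). This sketch uses hypothetical toy data; animation_opts() sets the frame duration, transition length, and easing described under Terminology, and animation_slider() customises the slider label. The values chosen are illustrative, not settings recommended by this exercise:

```r
library(plotly)

# Hypothetical toy data: three countries over three years
toy <- data.frame(
  Country    = rep(c("A", "B", "C"), times = 3),
  Year       = rep(c(2000L, 2010L, 2020L), each = 3),
  Old        = c(5, 10, 15, 6, 12, 18, 7, 14, 21),
  Young      = c(40, 30, 20, 38, 28, 18, 35, 25, 15),
  Population = c(100, 300, 200, 120, 320, 210, 150, 330, 230)
)

fig <- plot_ly(toy,
               x = ~Old, y = ~Young,
               size = ~Population, color = ~Country,
               frame = ~Year,    # one keyframe per Year
               ids = ~Country,   # match markers across frames by Country
               type = "scatter", mode = "markers") %>%
  animation_opts(frame = 1000,       # milliseconds per frame
                 transition = 500,   # milliseconds spent moving between frames
                 easing = "linear",
                 redraw = FALSE) %>%
  animation_slider(currentvalue = list(prefix = "Year: "))

# Print fig to display the widget with its Play button and slider
```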

3.4.1 Building an animated bubble plot: ggplotly() method

Note
  • The animated bubble plot includes a play/pause button and a slider component for controlling the animation.
Code
gg <- ggplot(globalPop, 
       aes(x = Old, 
           y = Young, 
           size = Population, 
           colour = Country)) +
  geom_point(aes(size = Population,
                 frame = Year),  # read by ggplotly(); ggplot2 warns it is an unknown aesthetic
             alpha = 0.7, 
             show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(x = '% Aged', 
       y = '% Young')

ggplotly(gg)
Note
  • A static bubble plot is created using ggplot2 functions and saved as an R object named gg.
  • The ggplotly() function then converts this static plot into an interactive plotly object (an HTML widget), using the frame aesthetic to build the animation.
Warning
  • You will notice that although the show.legend = FALSE argument was used, the legend still appears on the plot. To overcome this, theme(legend.position = 'none') should be used, as shown in the plot and code below.

3.4.2 Building an animated bubble plot: ggplotly() method - without legend

Code
gg <- ggplot(globalPop, 
       aes(x = Old, 
           y = Young, 
           size = Population, 
           colour = Country)) +
  geom_point(aes(size = Population,
                 frame = Year),
             alpha = 0.7) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  labs(x = '% Aged', 
       y = '% Young') + 
  theme(legend.position='none')

ggplotly(gg)


Key takeaways
  • Learnt the importance of animated graphics
  • Packages and tools used: gganimate, plotly, tidyr, and dplyr
  • Learnt how to create a static bubble plot and animate it with gganimate and with plotly

5.0 Further exploration

  1. To explore an animated plot that shows how Singapore’s population has changed over the years.

Observations:

  • Reflects a society transitioning to an aging population
  • Steady population growth until 2030, followed by a decline after 2030.
  • By 2050, the population drops to 4,635.1 thousand, a decrease of approximately 9.6% from the peak.

Code
# Prepare the dataset and filter for 'Singapore'
singapore_data <- globalPop %>%
  filter(Country == "Singapore") %>%
  mutate(Year = as.integer(Year), Population = as.numeric(Population))

p <- ggplot(singapore_data, aes(x = Year, y = Population, group = 1)) +
  # Line showing the trajectory of population over time
  geom_line(color = "blue", linewidth = 1) +
  # Moving dot to emphasize animation
  geom_point(color = "red", size = 4) +
  labs(title = "Population Change in Singapore", 
       subtitle = "Year: {round(frame_along)}",  # transition_reveal() provides frame_along, not frame_time
       x = "Year", 
       y = "Population") +
  theme_minimal() +
  transition_reveal(Year) +  # Reveals the line over time
  ease_aes('linear')

p
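The 9.6% figure in the observations is a decline measured from the peak. The arithmetic can be sketched with a hypothetical helper, pct_decline_from_peak(), on made-up numbers (not the Singapore series):

```r
# Hypothetical helper: percentage fall from a series' peak to its last value
pct_decline_from_peak <- function(x) {
  peak       <- max(x, na.rm = TRUE)
  last_value <- x[length(x)]
  (peak - last_value) / peak * 100
}

# Toy series (illustrative values only)
toy_pop <- c(4000, 4800, 5000, 4500)
pct_decline_from_peak(toy_pop)  # (5000 - 4500) / 5000 * 100 = 10
```

Assuming singapore_data is sorted by Year, the same helper could be applied to singapore_data$Population.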
  2. To explore a static bubble plot of the total population by continent.

Observations:

  • Asia has the highest population - largest bubble
  • Africa has a significantly large population - second largest bubble
  • Oceania has the smallest population - smallest bubble

Code
library(dplyr)

globalPop <- read_xls("data/GlobalPopulation.xls",
                      sheet="Data")
Code
# Process data for all continents
data_continent <- globalPop %>%
  group_by(Year, Continent) %>%
  summarise(TotalPopulation = sum(Population, na.rm = TRUE), .groups = 'drop')
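To see what group_by() plus summarise() produces, here is a sketch on a hypothetical toy table (values in thousands, like the worksheet). Each Year-Continent pair collapses to one row, and na.rm = TRUE skips the missing value instead of turning the total into NA:

```r
library(dplyr)

# Hypothetical toy data (values are illustrative, in thousands)
toy <- data.frame(
  Year       = c(2000, 2000, 2000, 2010, 2010, 2010),
  Continent  = c("Asia", "Asia", "Europe", "Asia", "Asia", "Europe"),
  Population = c(100, 50, 80, 120, NA, 82)
)

toy_continent <- toy %>%
  group_by(Year, Continent) %>%
  summarise(TotalPopulation = sum(Population, na.rm = TRUE), .groups = "drop")

toy_continent
# Rows are ordered by Year, then Continent;
# TotalPopulation: 150, 80, 120, 82 (the NA is skipped)
```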
Code
# Create a static bubble plot
ggplot(data_continent, aes(x = Continent, y = TotalPopulation, size = TotalPopulation, color=Continent)) +
  geom_point(alpha = 0.7) +
  scale_size_area(max_size = 15) +
  labs(
    title = "Total Population by Continent",
    x = "Continent",
    y = "Total Population (Thousands)"
  ) +
  theme_minimal() +
  theme(legend.position = "none") +  # Remove legend
  coord_flip()  # Flip coordinates for better readability
  3. To explore an animated plot that visualizes the growth of total population by continent over the years.

Observations:

  • Asia has the highest population growth - trajectory is steep and significantly outpaces other continents
  • Africa’s population is also increasing rapidly, showing a strong upward trend.
  • Europe, North America, South America show slow growth, with relatively flat trends
  • Oceania has the lowest population, maintaining a nearly constant trend.

Code
# Process data for all continents
data_continent <- globalPop %>%
  group_by(Year, Continent) %>%
  summarise(TotalPopulation = sum(Population, na.rm = TRUE), .groups = 'drop')
Code
# Create an animated plot for population growth by continent
ggplot(data_continent, aes(x = Year, y = TotalPopulation, color = Continent, group = Continent)) +
  geom_line(linewidth = 1) +  # 'size' for lines is deprecated since ggplot2 3.4.0
  geom_point(size = 3) +
  labs(
    title = "Population Growth by Continent Over the Years",
    x = "Year",
    y = "Total Population (Thousands)",
    color = "Continent"
  ) +
  theme_minimal() +
  transition_reveal(Year)